Abstract: We consider the setting where actions can be used to modify a state sequence before compression. The minimum rate needed to losslessly describe the optimal modified sequence is characterized when the state sequence is either non-causally or causally available at the action encoder. The achievability is closely related to the optimal channel coding strategy for channels with states. We also extend the analysis to the lossy case.



I. Introduction 

Consider the standard Shannon-theoretic lossy source coding setting where we have a source $S^n$ on which we wish to perform lossy compression. The encoder receives the source $S^n$ and produces an index $M$ that is sent to the decoder. Based on the index, the decoder produces a lossy reconstruction, $\hat{S}^n$, such that the per-symbol distortion constraint is satisfied. An alternative view of this problem is as one where the encoder is first required to produce the reconstruction sequence $\hat{S}^n$ and then uses a lossless compression algorithm to describe $\hat{S}^n$ to the decoder. This point of view on lossy source coding, depicted in Figure 1, has been instrumental in recent developments of an approach to universal and implementable lossy compressors, cf. [14], [15] and references therein.

In this paper, we generalize the above setting by asking the following question: what if the encoder makes an "error" in outputting the reconstruction sequence? The encoder may wish to take "actions" to output a sequence of reconstruction symbols. However, due to noise, the reconstruction symbols at the output of the encoder may be different from the intended reconstruction symbols. In this case, we are still interested in sending the reconstruction sequence (the modified source sequence) to the decoder. The question then is, what is the optimal rate-distortion tradeoff in such a scenario? As a more concrete example, consider lossy compression of a binary source $S^n$. With the source as input, the encoder first attempts to output the desired reconstruction sequence, but due to errors in the circuitry of the encoder, a bit that is meant to be one can still be zero with some probability and vice versa. Using a universal lossless compression algorithm, we transmit the output of the "faulty" encoder, $\hat{S}^n$, to the decoder. We are now interested in the optimum rate-distortion tradeoff under the assumption of a "faulty" encoder.

As another example of our general setting, which may seem at first sight to be unrelated to the question we asked above, imagine that we have a number of robots working on a factory floor and the positions of all the robots need to be reported to a remote location. Letting $S$ represent the position of a robot, we would expect to send $H(S)$ bits to the remote location. However, what if the robots can take actions to change their positions so that they can be more efficiently described? A local command center can first give commands (actions) to the robots so that they move in a cooperative way into a final position sequence that requires fewer bits to describe. The command center may face two issues in general: cost constraints and uncertainty. A cost constraint occurs because each robot should save its power and not move too far away from its current location. The uncertainty is a result of the robots not moving exactly as instructed by the local command center.

Both examples are instantiations of the problem setting illustrated in Fig. 2 (formal definitions are given in the next section). Here, $S^n$ is our observed source (or state) sequence. We assume a general cost function $\Lambda(a, s, y)$ and a general relation, specified by a conditional PMF $p(y|a, s)$, relating the modified source sequence to be compressed to the original source sequence (state) and the action taken by the encoder toward modifying it. As shown in the preceding examples, we are interested in compressing the final output $Y^n$.

Fig. 2. Compression with actions. The action encoder first observes the state sequence $S^n$ and then generates an action sequence $A^n$. The $i$th output $Y_i$ is the output of a channel $p(y|a, s)$ when $a = A_i$ and $s = S_i$. The compressor outputs a description of $Y^n$, $M \in [1 : 2^{nR}]$, from $Y^n$ alone if the side information $Z^n$ is not available at the compressor. If the side information is available, then the compressor generates the description based on $Y^n$ and $Z^n$. The remote decoder generates $\hat{Y}^n$ based on $M$ and its available side information $Z^n$ as a reconstruction of $Y^n$.

Our problem setup is also closely related to the channel coding problem when the state information is available at the encoder. The case where the state information is causally available was first solved by Shannon in [4]. When the state information is non-causally known at the encoder, the channel capacity result was derived in [2] and [3]. Various interesting
extensions can be found in ||5|— (5). The difference in our 
approach described here is that we make the output of the channel as compressible as possible. Our main results when the decoder requires lossless reconstruction are given in Section III, where we characterize the rate-cost tradeoff function for the setting in Fig. 2. We also characterize the rate-cost function when $S^n$ is only causally known at the action encoder. In Section IV, we extend the setting to the lossy case where the decoder requires a lossy version of $Y^n$. We characterize the rate-distortion-cost function when $S^n$ is causally known at the action encoder and the side information $Z^n$ is available at both the compressor and the decoder. For other settings, we give achievable schemes for the rate-distortion-cost functions. We conclude in Section V, where we mention some possible extensions for future consideration.



II. Definitions 

We give formal definitions for the setups under consideration in this section. We will follow the notation of [11]. Sources $(S^n, Z^n)$ are assumed to be i.i.d.; i.e., $(S^n, Z^n) \sim \prod_{i=1}^n p_{S,Z}(s_i, z_i)$.



A. Lossless case with no side information at the compressor 

We now give the definitions for the case when the side information $Z^n$ is not available at the compressor. Referring to Figure 2, an $(n, 2^{nR})$ code for this setup consists of

• an action encoding function $f_a : \mathcal{S}^n \to \mathcal{A}^n$;

• a compression function $f_c : \mathcal{Y}^n \to M \in [1 : 2^{nR}]$;

• a decoding function $f_d : [1 : 2^{nR}] \times \mathcal{Z}^n \to \mathcal{Y}^n$.

The average cost of the system is $E\,\Lambda(A^n, S^n, Y^n) = \frac{1}{n} \sum_{i=1}^n E\,\Lambda(A_i, S_i, Y_i)$. A rate-cost tuple $(R, B)$ is said to be achievable if there exists a sequence of codes such that

$$\limsup_{n \to \infty} \Pr\big(Y^n \neq f_d(f_c(Y^n), Z^n)\big) = 0, \tag{1}$$

$$\limsup_{n \to \infty} E\,\Lambda(A^n, S^n, Y^n) \le B, \tag{2}$$

where $\Lambda(A^n, S^n, Y^n) = \sum_{i=1}^n \Lambda(A_i, S_i, Y_i)/n$. Given cost $B$, the rate-cost function, $R(B)$, is then the infimum of rates $R$ such that $(R, B)$ is achievable.
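To make the roles of the state, action, channel and cost concrete, the following is a minimal simulation sketch of the chain $S^n \to A^n \to Y^n$. It is not part of the original development: it assumes the binary channel $Y_i = S_i \oplus A_i \oplus$ noise used later in the examples, and the names `simulate`, `do_nothing`, `cancel_state` and `cost` are hypothetical.

```python
import random

def simulate(n, p, action_fn, cost_fn, seed=0):
    """Simulate S^n -> A^n -> Y^n for the assumed channel Y = S xor A xor noise."""
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in range(n)]               # i.i.d. Bern(1/2) state
    a = action_fn(s)                                        # non-causal: sees all of s^n
    noise = [1 if rng.random() < p else 0 for _ in range(n)]  # i.i.d. Bern(p) noise
    y = [si ^ ai ^ ni for si, ai, ni in zip(s, a, noise)]
    # Empirical average cost (1/n) * sum_i Lambda(a_i, s_i, y_i)
    avg_cost = sum(cost_fn(ai, si, yi) for ai, si, yi in zip(a, s, y)) / n
    return s, a, y, avg_cost

# Two extreme action encoders, with the cost Lambda(a, s, y) = a.
do_nothing = lambda s: [0] * len(s)      # never acts: y is the noisy state
cancel_state = lambda s: list(s)         # a_i = s_i: y is pure noise
cost = lambda a, s, y: a

s0, _, y0, c0 = simulate(1000, 0.0, do_nothing, cost)
_, _, y1, c1 = simulate(1000, 0.0, cancel_state, cost)
```

With the noise switched off (`p = 0.0`), `do_nothing` leaves `y0` equal to the state at zero cost, while `cancel_state` makes `y1` the all-zero sequence at an average cost of about 1/2. The rate-cost function $R(B)$ quantifies the tradeoff between these extremes.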

Remark: Suppose that the channel is given by $P_{Y|A,S} = 1_{Y = A}$ (where $1(\cdot)$ is the indicator function) and that the cost constraint is given by $\Lambda(A^n, S^n)$; then we recover the standard lossy source coding setting with $A^n$ being the reconstruction sequence.




B. Lossless case when side information is available at the 
compressor 

In the case when side information $Z^n$ is available at the compressor, the definitions remain mostly the same, with the exception that the compression function is now given by

$$f_c : \mathcal{Y}^n \times \mathcal{Z}^n \to M \in [1 : 2^{nR}].$$



C. Lossy case 

In the setting where the decoder requires a lossy version of $Y^n$, the definitions remain largely the same, with the exception that the probability of error constraint, inequality (1), is replaced by the following distortion constraint:

$$\limsup_{n \to \infty} E\,d(Y^n, \hat{Y}^n) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^n E\,d(Y_i, \hat{Y}_i) \le D. \tag{3}$$

A rate $R$ is said to be achievable if there exists a sequence of $(n, 2^{nR})$ codes satisfying both the cost constraint (inequality (2)) and the distortion constraint (inequality (3)). Given cost $B$ and distortion $D$, the rate-cost-distortion function, $R(B, D)$, is then the infimum of rates $R$ such that the tuple $(R, B, D)$ is achievable.



D. Causal observations of state sequence 

In both the lossless and lossy cases, we will also consider the setup when the state sequence is only causally known at the action encoder. The definitions remain the same, except for the action encoding function, which is now restricted to the following form: for each $i \in [1 : n]$, $f_{a,i} : \mathcal{S}^i \to \mathcal{A}$.

III. Lossless case 

In this section, we present our main results for the lossless case. We will only consider the case when the side information is not available at the compressor, as it will be clear from the results that the presence of side information at the compressor does not change the rate-cost regions, both when $S^n$ is causally known and when $S^n$ is non-causally known.

Theorem 1 gives the rate-cost function when the state sequence is non-causally available at the action encoder, while Theorem 2 gives the rate-cost function when the state sequence is causally available.

A. Lossless, non-causal compression with action 

Theorem 1 (Rate-cost function for lossless, non-causal case)

The rate-cost function for the compression with action setup when the state sequence $S^n$ is non-causally available at the action encoder is given by

$$R(B) = \min_{p(v|s),\ a = f(s,v)\,:\ E\Lambda(S,A,Y) \le B} I(V; S|Z) + H(Y|V, Z), \tag{4}$$

where the joint distribution is of the form $p(z, s, v, a, y) = p(z, s)\,p(v|s)\,1_{\{f(s,v) = a\}}\,p(y|a, s)$. The cardinality of the auxiliary random variable $V$ is upper bounded by $|\mathcal{V}| \le |\mathcal{S}| + 2$.

Remarks

• Replacing $a = f(s, v)$ by a general distribution $p(a|s, v)$ does not decrease the minimum in (4). For any joint distribution $p(s)p(v|s)p(a|s, v)$, we can always find a random variable $W$ and a function $f$ such that $W$ is independent of $(S, V, Z)$ and $A = f(V, W, S)$. Consider $V' = (V, W)$. The Markov condition $V' - (A, S) - (Y, Z)$ still holds. Thus $H(Y|V', Z) + I(V'; S|Z)$ is achievable. Furthermore,

$$\begin{aligned}
I(V'; S|Z) + H(Y|V', Z) &= I(V, W; S|Z) + H(Y|V, W, Z) \\
&\le I(V, W; S|Z) + H(Y|V, Z) \\
&= I(V; S|Z) + H(Y|V, Z).
\end{aligned}$$

• $R(B)$ is a convex function in $B$.

• For each cost function $\Lambda(s, a, y)$, we can replace it with a new cost function involving only $s$ and $a$ by defining $\Lambda'(s, a) = E[\Lambda(S, A, Y) \mid S = s, A = a]$. Note that $Y$ is distributed as $p(y|s, a)$ given $S = s$, $A = a$.

• If we set $P_{Y|A,S} = 1_{Y = A}$ and $Z = \emptyset$, then we have

$$\begin{aligned}
I(V; S) + H(A|V) &= I(V, A; S) - I(A; S|V) + H(A|V) \\
&= I(V, A; S) \\
&\ge I(A; S),
\end{aligned}$$

where the second step uses $H(A|S, V) = 0$, since $A = f(S, V)$. The rate-cost function then works out to

$$R(B) \ge \min_{E\Lambda(A,S) \le B} I(A; S)$$

for some $p(a|s)$, and choosing $V = A$ attains this lower bound. This recovers the standard lossy source coding result with $\mathcal{A}$ being the reconstruction alphabet and $B$ being the desired distortion.
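The reduction of a cost $\Lambda(s, a, y)$ to a cost $\Lambda'(s, a)$ in the remarks above is a plain conditional expectation over the channel output. The sketch below illustrates it on a small example. The helper name `reduced_cost`, the binary channel and the particular cost are illustrative assumptions, not taken from the original text.

```python
def reduced_cost(cost, channel, states, actions, outputs):
    """Return the table Lambda'(s, a) = sum_y p(y | s, a) * Lambda(s, a, y)."""
    return {
        (s, a): sum(channel(y, s, a) * cost(s, a, y) for y in outputs)
        for s in states
        for a in actions
    }

p = 0.1
# Assumed binary channel: Y = S xor A xor Bern(p) noise.
channel = lambda y, s, a: p if y != (s ^ a) else 1 - p
# Assumed cost that depends on the output: pay 1 whenever Y = 1.
cost = lambda s, a, y: float(y)

table = reduced_cost(cost, channel, (0, 1), (0, 1), (0, 1))
```

Here `table[(s, a)]` equals `p` when `s ^ a == 0` and `1 - p` otherwise, so the constraint $E\Lambda(S, A, Y) \le B$ can be checked using the pair $(S, A)$ alone.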

The achievability of Theorem 1 involves an interesting observation in the decoding operation, but before proving the theorem, we first state a corollary of Theorem 1: the case when side information is absent ($Z = \emptyset$). We will also sketch an alternative achievability proof for the corollary, which will serve as a contrast to the achievability scheme for Theorem 1.

Corollary 1 (Side information is absent) If $Z = \emptyset$, then the rate-cost function is given by

$$R(B) = \min_{p(v|s),\ a = f(s,v)\,:\ E\Lambda(S,A,Y) \le B} I(V; S) + H(Y|V)$$

for some $p(s, v, a, y) = p(s)\,p(v|s)\,1_{\{f(s,v) = a\}}\,p(y|a, s)$.

Achievability sketch for Corollary 1

Codebook generation: Fix $p(v|s)$, $f(s, v)$ and $\epsilon > 0$.

• Generate $2^{n(I(V;S)+\epsilon)}$ sequences $v^n(l)$ independently, $l \in [1 : 2^{n(I(V;S)+\epsilon)}]$, each according to $\prod_{i=1}^n p_V(v_i)$, to cover $S^n$.

• For each $V^n$ sequence, the $Y^n$ sequences that are jointly typical with $V^n$ are indexed by $2^{n(H(Y|V)+\epsilon)}$ numbers.

Encoding and decoding:

• The action encoder looks for a $V^n$ in the codebook that is jointly typical with $S^n$ and generates $A_i = f(S_i, V_i)$, $i = 1, \ldots, n$.

• The compressor looks for a $\hat{V}^n$ in the codebook that is jointly typical with the channel output $Y^n$ and sends the index of that $\hat{V}^n$ sequence to the decoder. The compressor then sends the index of $Y^n$ as described in the second part of codebook generation.

• The decoder simply uses both indices from the compressor to reconstruct $Y^n$.

Using standard typicality arguments, we can show that the encoding succeeds with high probability and the probability of error can be made arbitrarily small.

Remark: Note that $\hat{V}^n$ is not necessarily equal to $V^n$. That is, the $V^n$ codeword chosen by the action encoder can be different from the $\hat{V}^n$ codeword chosen by the compressor. But this is not an error event, since we still recover the same $Y^n$ even if a different codeword was used.

This scheme, however, does not extend to the case when side information is available at the decoder. The term $H(Y|Z, V)$ in Theorem 1 requires us to bin the set of $Y^n$ sequences according to the side information available at the decoder. If we were to extend the above achievability scheme, we would bin the set of $Y^n$ sequences to $2^{n(H(Y|Z,V)+\epsilon)}$ bins. The compressor would find a $\hat{V}^n$ sequence that is jointly typical with $Y^n$, send its index to the decoder using a rate of $I(V; S|Z) + \epsilon$, and then send the index of the bin which contains $Y^n$. The decoder would then look for the unique $\hat{Y}^n$ sequence in the bin that is jointly typical with $\hat{V}^n$ and $Z^n$. Unfortunately, while the $\hat{V}^n$ codeword is jointly typical with $Y^n$ with high probability, it is not necessarily jointly typical with $Z^n$, since $\hat{V}^n$ may not be equal to $V^n$ ($V^n$ is jointly typical with $Z^n$ with high probability, as $V^n$ is jointly typical with $S^n$ with high probability and $V - S - Z$). One could try to overcome this problem by insisting that the compressor finds the same $V^n$ sequence as the action encoder, but this requirement imposes additional constraints on the achievable rate.

Instead of requiring that the compressor finds a jointly typical $V^n$ sequence, we use an alternative approach to prove Theorem 1. We simply bin the set of all $Y^n$ sequences to $2^{n(I(V;S|Z)+H(Y|Z,V)+\epsilon)}$ bins and send the bin index to the decoder. The decoder looks for the unique $\hat{Y}^n$ sequence in bin $M$ such that $(V^n(l), \hat{Y}^n, Z^n)$ are jointly typical for some $l \in [1 : 2^{n(I(V;S)+\epsilon)}]$. Note that there can be more than one $V^n(l)$ sequence which is jointly typical with $(\hat{Y}^n, Z^n)$, but this is not an error event as long as the $\hat{Y}^n$ sequence in bin $M$ is unique. We now give the details of this achievability scheme.
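The bin count above is stated in terms of $I(V; S|Z)$, whereas the proof below states the rate condition in terms of $I(V; S) - I(V; Z)$. The two agree under the Markov chain $V - S - Z$ implied by the joint distribution of Theorem 1. The short derivation is added here for convenience:

```latex
\begin{align*}
I(V; S \mid Z) &= H(V \mid Z) - H(V \mid S, Z) \\
               &= H(V \mid Z) - H(V \mid S)
                  && \text{(since } V - S - Z\text{)} \\
               &= \bigl[H(V) - H(V \mid S)\bigr] - \bigl[H(V) - H(V \mid Z)\bigr] \\
               &= I(V; S) - I(V; Z).
\end{align*}
```

Hence $I(V; S|Z) + H(Y|Z, V) = I(V; S) - I(V; Z) + H(Y|Z, V)$.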

Proof of achievability for Theorem 1

Codebook generation

• Generate $2^{n(I(V;S)+\delta(\epsilon))}$ $V^n$ codewords according to $\prod_{i=1}^n p(v_i)$.

• For the entire set of possible $Y^n$ sequences, bin them uniformly at random to $2^{nR}$ bins $\mathcal{B}(M)$, where $R > I(V; S) - I(V; Z) + H(Y|Z, V)$.

Encoding

• Given $s^n$, the encoder looks for a $v^n$ sequence in the codebook such that $(v^n, s^n) \in \mathcal{T}_\epsilon^{(n)}$. If there is more than one, it randomly picks one from the set of typical sequences. If there is none, it picks a random index from $[1 : 2^{n(I(V;S)+\delta(\epsilon))}]$.

• It then generates $a^n$ according to $a_i = f(v_i, s_i)$ for $i \in [1 : n]$.

• At the second encoder, it takes the output $y^n$ sequence and sends out the bin index $M$ such that $y^n \in \mathcal{B}(M)$.

Decoding

• The decoder looks for the unique $\hat{y}^n$ sequence such that $(v^n(l), \hat{y}^n, z^n) \in \mathcal{T}_\epsilon^{(n)}$ for some $l \in [1 : 2^{n(I(V;S)+\delta(\epsilon))}]$ and $\hat{y}^n \in \mathcal{B}(M)$. If there is none or more than one, it declares an error.

Analysis of probability of error

Define the following error events, where $L$ denotes the index chosen by the action encoder:

$$\mathcal{E}_0 := \{(V^n(L), Z^n, Y^n) \notin \mathcal{T}_\epsilon^{(n)}\},$$

$$\mathcal{E}_l := \{(V^n(l), Z^n, \hat{Y}^n) \in \mathcal{T}_\epsilon^{(n)} \text{ for some } \hat{Y}^n \neq Y^n,\ \hat{Y}^n \in \mathcal{B}(M)\}.$$

By symmetry of the codebook generation, it suffices to consider $M = 1$. The probability of error is upper bounded by

$$P(\mathcal{E}) \le P(\mathcal{E}_0) + \sum_{l=1}^{2^{n(I(V;S)+\delta(\epsilon))}} P(\mathcal{E}_l).$$

$P(\mathcal{E}_0) \to 0$ as $n \to \infty$ following standard analysis of probability of error. It remains to analyze the second error term. Consider $P(\mathcal{E}_l)$ and define $\mathcal{E}_l(V^n, Z^n) := \{(V^n(l), Z^n, \hat{Y}^n) \in \mathcal{T}_\epsilon^{(n)} \text{ for some } \hat{Y}^n \neq Y^n,\ \hat{Y}^n \in \mathcal{B}(1)\}$. We have

$$\begin{aligned}
P(\mathcal{E}_l) &= P(\mathcal{E}_l(V^n, Z^n)) \\
&= \sum_{(v^n, z^n) \in \mathcal{T}_\epsilon^{(n)}} P(V^n(l) = v^n, Z^n = z^n)\, P(\mathcal{E}_l(v^n, z^n) \mid v^n, z^n) \\
&= \sum_{(v^n, z^n) \in \mathcal{T}_\epsilon^{(n)}} \Big( P(V^n(l) = v^n, Z^n = z^n) \sum_{y^n} P(Y^n = y^n \mid v^n, z^n)\, P(\mathcal{E}_l(v^n, z^n) \mid v^n, z^n, y^n) \Big) \\
&\overset{(a)}{\le} \sum_{(v^n, z^n) \in \mathcal{T}_\epsilon^{(n)}} \Big( P(V^n(l) = v^n, Z^n = z^n) \sum_{y^n} P(Y^n = y^n \mid v^n, z^n)\, 2^{n(H(Y|Z,V)+\delta(\epsilon)-R)} \Big) \\
&\overset{(b)}{=} \sum_{(v^n, z^n) \in \mathcal{T}_\epsilon^{(n)}} P(V^n(l) = v^n)\, P(Z^n = z^n)\, 2^{n(H(Y|Z,V)+\delta(\epsilon)-R)} \\
&\le 2^{n(H(V,Z)+\delta(\epsilon))}\, 2^{-n(H(V)-\delta(\epsilon))}\, 2^{-n(H(Z)-\delta(\epsilon))}\, 2^{n(H(Y|Z,V)+\delta(\epsilon)-R)} \\
&= 2^{n(H(Y|V,Z) - I(V;Z) - R + 4\delta(\epsilon))}.
\end{aligned}$$

(a) follows since the set of $Y^n$ sequences is binned uniformly at random, each sequence independently of the other $Y^n$ sequences, and from the fact that there are at most $2^{n(H(Y|Z,V)+\delta(\epsilon))}$ $Y^n$ sequences which are jointly typical with a given typical $(v^n, z^n)$. (b) follows from the fact that the codebook generation is independent of $(S^n, Z^n)$. Therefore, for any fixed $l$, $V^n(l)$ is independent of $Z^n$. Hence, if $R > I(V; S) - I(V; Z) + H(Y|Z, V) + 6\delta(\epsilon)$,

$$\sum_{l=1}^{2^{n(I(V;S)+\delta(\epsilon))}} P(\mathcal{E}_l) \le 2^{-n\delta(\epsilon)} \to 0$$

as $n \to \infty$.

We now turn to the proof of converse for Theorem 1.

Proof of converse for Theorem 1

Given an $(n, 2^{nR})$ code for which the probability of error goes to zero with $n$ and which satisfies the cost constraint, define $V_i = (Y^{i-1}, S_{i+1}^n, Z^{n \setminus i})$. We have

$$\begin{aligned}
nR &\ge H(M|Z^n) \\
&= H(M, Y^n|Z^n) - H(Y^n|M, Z^n) \\
&\overset{(a)}{\ge} H(M, Y^n|Z^n) - n\epsilon_n \\
&= H(Y^n|Z^n) - n\epsilon_n \\
&= \sum_{i=1}^n H(Y_i|Y^{i-1}, Z^n) - n\epsilon_n \\
&= \sum_{i=1}^n H(Y_i|Y^{i-1}, S_{i+1}^n, Z^n) + \sum_{i=1}^n I(Y_i; S_{i+1}^n|Y^{i-1}, Z^n) - n\epsilon_n \\
&\overset{(b)}{=} \sum_{i=1}^n H(Y_i|Y^{i-1}, S_{i+1}^n, Z^n) + \sum_{i=1}^n I(Y^{i-1}; S_i|S_{i+1}^n, Z^n) - n\epsilon_n \\
&\overset{(c)}{=} \sum_{i=1}^n H(Y_i|Y^{i-1}, S_{i+1}^n, Z^n) + \sum_{i=1}^n I(Y^{i-1}, S_{i+1}^n, Z^{n \setminus i}; S_i|Z_i) - n\epsilon_n \\
&= \sum_{i=1}^n H(Y_i|V_i, Z_i) + \sum_{i=1}^n I(V_i; S_i|Z_i) - n\epsilon_n \\
&= n H(Y_Q|V_Q, Q, Z_Q) + n I(V_Q; S_Q|Q, Z_Q) - n\epsilon_n,
\end{aligned}$$

where (a) is due to Fano's inequality, (b) follows from the Csiszár sum identity, and (c) holds because $(S^n, Z^n)$ is an i.i.d. source. Note that the Markov conditions $V_i - (S_i, A_i) - Y_i$ and $V_i - S_i - Z_i$ hold. Finally, we introduce $Q$ as the time-sharing random variable, i.e., $Q \sim \mathrm{Unif}[1, \ldots, n]$, and set $V = (V_Q, Q)$, $Y = Y_Q$ and $S = S_Q$, which completes the proof.

Remark: Note that the proof of converse continues to hold even if side information $Z^n$ is available at the compressor. This observation shows that side information $Z^n$ at the compressor does not change the rate-cost tradeoff region in Theorem 1.

B. Lossless, causal compression with action

Our next result gives the rate-cost function for the case of lossless, causal compression with action.

Theorem 2 (Rate-cost function for lossless, causal case)

The rate-cost function for the compression with action setup when the state information is causally available at the action encoder is given by

$$R(B) = \min_{p(v),\ a = f(s,v)\,:\ E\Lambda(S,A,Y) \le B} H(Y|V, Z), \tag{5}$$

where the joint distribution is of the form $p(z, s, v, a, y) = p(z, s)\,p(v)\,1_{\{f(s,v) = a\}}\,p(y|a, s)$. The cardinality of $\mathcal{V}$ is upper bounded by $|\mathcal{S}| + 2$.

Achievability sketch: Here $V$ simply serves as a time-sharing random variable. Fix a $p(v)$ and $f(s, v)$. We first generate a $V^n$ sequence and reveal it to the action encoder, the compressor and the decoder. The encoder generates $A_i = f(S_i, V_i)$. The compressor simply bins the set of $Y^n$ sequences to $2^{n(H(Y|V,Z)+\epsilon)}$ bins and sends the index of the bin which contains $Y^n$. The decoder recovers $Y^n$ by finding the unique $\hat{Y}^n$ sequence in bin $M$ such that $(V^n, Z^n, \hat{Y}^n)$ are jointly typical.
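The binning step in this sketch can be illustrated with a small toy program. It is only a sketch under simplifying assumptions that are not in the original: there is no $V^n$ or $Z^n$, sequences are binary words of length 16 stored as integers, and "typical" is replaced by a crude weight test. The names `make_bins`, `is_typical` and `decode` are hypothetical.

```python
import random

def make_bins(n, rate_bits, seed=0):
    """Assign every length-n binary sequence (stored as an int) to a uniformly random bin."""
    rng = random.Random(seed)
    num_bins = 2 ** rate_bits
    return [rng.randrange(num_bins) for _ in range(2 ** n)]

def is_typical(y, max_weight):
    """Toy stand-in for typicality: the sequence has at most max_weight ones."""
    return bin(y).count("1") <= max_weight

def decode(bin_index, bins, max_weight):
    """Return the unique typical sequence in the bin, or None (a decoding error)."""
    candidates = [y for y, b in enumerate(bins)
                  if b == bin_index and is_typical(y, max_weight)]
    return candidates[0] if len(candidates) == 1 else None

n, rate_bits, max_weight = 16, 14, 3
bins = make_bins(n, rate_bits)

y = 0b0000010000000010                 # a "typical" channel output (weight 2)
y_hat = decode(bins[y], bins, max_weight)

y_bad = 0b1111111100000000             # an atypical output (weight 8)
y_bad_hat = decode(bins[y_bad], bins, max_weight)
```

The compressor sends only `bins[y]` (14 bits instead of 16). Decoding returns `y` whenever no other typical sequence shares its bin, and an atypical output such as `y_bad` is never recovered. In the actual scheme, the typical set shrinks relative to the number of bins as $n$ grows, which drives the error probability to zero.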

Remark: Just as the achievability in the non-causal case is closely related to the channel coding strategy in [2], our achievability in this section uses the "Shannon Strategy" in [4]. In both cases, the optimal channel coding strategy yields the most compressible output when the message rate goes to zero.

Proof of Converse: Given an $(n, 2^{nR})$ code that satisfies the constraints, define $V_i = (S^{i-1}, Z^{n \setminus i})$. We have

$$\begin{aligned}
nR &\ge H(M|Z^n) \\
&= H(M, Y^n|Z^n) - H(Y^n|M, Z^n) \\
&\overset{(a)}{\ge} H(M, Y^n|Z^n) - n\epsilon_n \\
&= H(Y^n|Z^n) - n\epsilon_n \\
&= \sum_{i=1}^n H(Y_i|Y^{i-1}, Z^n) - n\epsilon_n \\
&\ge \sum_{i=1}^n H(Y_i|Y^{i-1}, A^{i-1}, S^{i-1}, Z_i, Z^{n \setminus i}) - n\epsilon_n \\
&\overset{(b)}{=} \sum_{i=1}^n H(Y_i|A^{i-1}, S^{i-1}, Z_i, Z^{n \setminus i}) - n\epsilon_n \\
&\overset{(c)}{=} \sum_{i=1}^n H(Y_i|S^{i-1}, Z_i, Z^{n \setminus i}) - n\epsilon_n \\
&\overset{(d)}{=} n H(Y_Q|V_Q, Q, Z_Q) - n\epsilon_n,
\end{aligned}$$

where (a) is due to Fano's inequality; (b) follows from the Markov chain $Y_i - (S^{i-1}, A^{i-1}, Z^n) - Y^{i-1}$; (c) follows since $A^{i-1}$ is a function of $S^{i-1}$. Note that $A_i$ is now a function of $S_i$ and $V_i$. Finally, in (d), we introduce $Q$ as the time-sharing random variable, i.e., $Q \sim \mathrm{Unif}[1 : n]$. Thus, by setting $V = (V_Q, Q)$ and $Y = Y_Q$, we have completed the proof.

Remark: Note that the proof of converse continues to hold even if side information $Z^n$ is available at the compressor. This observation shows that side information $Z^n$ at the compressor does not change the rate-cost tradeoff region in Theorem 2.



C. Examples 

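1) No side information: In this subsection, we first consider an example with state sequence $S^n \sim$ i.i.d. Bern(1/2) and $Z = \emptyset$. We have two actions available, $A = 0$ and $A = 1$. The cost constraint is on the frequency of action $A = 1$, $E A \le B$. The channel output is $Y_i = S_i \oplus A_i \oplus S_{N,i}$, where $\oplus$ is the modulo-2 sum and $\{S_{N,i}\}$ are i.i.d. Bern($p$) noise, $p < 1/2$. The example is illustrated in Fig. 3.

[Figure 3: block diagram. The state $S^n \sim$ i.i.d. Bern(1/2) enters the action encoder, which outputs $A^n$ subject to $E A \le B$; the state, the action and the noise $S_N^n \sim$ i.i.d. Bern($p$) are added modulo 2, and the result enters the compressor, which outputs $M \in \{1, \ldots, 2^{nR}\}$.]

Fig. 3. Binary example with side information $Z = \emptyset$.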

We use the following lemma to simplify the optimization problem in Eq. (4) applied to the binary example.

Lemma 1 For the binary example, it is without loss of optimality to have the following constraints when solving the optimization problem of Eq. (4):

• $\mathcal{V} = \{0, 1, 2\}$, $\Pr(V = 0) = \Pr(V = 1) = \theta/2$, for some $\theta \in [0, 1]$.

• The function $a = f(s, v)$ is of the form: $f(s, 0) = s$, $f(s, 1) = 1 - s$ and $f(s, 2) = 0$.

• $\Pr(S = 0|V = 1) = \Pr(S = 1|V = 0) = \Delta$ and $\Pr(S = 0|V = 2) = 1/2$.

• $\Delta\theta \le B$.

Note that the constraints guarantee that $\Pr(S = 0) = \Pr(S = 1) = 1/2$.

Proof: See Appendix. ■
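Using Lemma 1, we can simplify the objective function in Eq. (4) in the following way:

$$\begin{aligned}
H(Y|V) + I(V; S) &= H(Y|V) - H(S|V) + H(S) \\
&= H(S \oplus A \oplus S_N|V) - H(S|V) + 1 \\
&= \frac{\theta}{2}\big(H(S_N|V = 0) - H_2(\Delta)\big) + \frac{\theta}{2}\big(H(1 \oplus S_N|V = 1) - H_2(\Delta)\big) \\
&\quad + (1 - \theta)\big(H(S \oplus S_N|V = 2) - 1\big) + 1 \\
&= \theta\big(H_2(p) - H_2(\Delta)\big) + 1,
\end{aligned}$$

where $H_2(\cdot)$ is the binary entropy function, i.e., $H_2(\delta) = -\delta \log \delta - (1 - \delta) \log(1 - \delta)$. It follows that

$$\begin{aligned}
R(B) &= \min_{\theta \in [0,1],\ \theta\Delta \le B} \theta\big(H_2(p) - H_2(\Delta)\big) + 1 \\
&= 1 + \min_{\Delta \in [B, 1/2]} \frac{B}{\Delta}\big(H_2(p) - H_2(\Delta)\big) \\
&= 1 - B \max_{\Delta \in [B, 1/2]} \frac{H_2(\Delta) - H_2(p)}{\Delta} \\
&= \begin{cases}
1 - B\,\dfrac{H_2(b^*) - H_2(p)}{b^*}, & \text{if } 0 \le B < b^*; \\[2mm]
1 - H_2(B) + H_2(p), & \text{if } b^* \le B \le 1/2,
\end{cases}
\end{aligned} \tag{6}$$

where $b^*$ is the solution of the following equation:

$$\frac{H_2(b) - H_2(p)}{b} = \frac{dH_2}{db}, \quad b \in [0, 1/2], \tag{7}$$

which is illustrated in Fig. 4.

[Figure 4: plot over the horizontal axis $b \in [0, 0.5]$ illustrating the solution of Eq. (7).]

Fig. 4. The threshold $b^*$ solves $\frac{H_2(b) - H_2(p)}{b} = \frac{dH_2}{db}$, $b \in [0, 1/2]$.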
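The non-causal rate-cost function of the binary example is easy to evaluate numerically. The sketch below is not from the original text. It relies on the observation (a short calculation, stated here as an assumption of the sketch) that $H_2(b) - b\,\frac{dH_2}{db} = -\log_2(1 - b)$, so that Eq. (7) reduces to $-\log_2(1 - b^*) = H_2(p)$, i.e., $b^* = 1 - 2^{-H_2(p)}$. A brute-force maximization over $\Delta$ is included as a cross-check. The function names are hypothetical and entropies are in bits.

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def b_star(p):
    """Threshold b* of Eq. (7), using -log2(1 - b*) = H2(p)."""
    return 1.0 - 2.0 ** (-h2(p))

def rate_noncausal(B, p):
    """Closed form of Eq. (6), valid for 0 <= B <= 1/2."""
    bs = b_star(p)
    if B < bs:
        return 1.0 - B * (h2(bs) - h2(p)) / bs
    return 1.0 - h2(B) + h2(p)

def rate_noncausal_grid(B, p, steps=20000):
    """Cross-check: 1 - B * max over Delta in [B, 1/2] of (H2(Delta) - H2(p)) / Delta."""
    if B == 0.0:
        return 1.0
    grid = (B + (0.5 - B) * k / steps for k in range(steps + 1))
    return 1.0 - B * max((h2(d) - h2(p)) / d for d in grid)
```

For $p = 0.1$ this gives $b^* \approx 0.2775$. The curve starts at $R(0) = 1$, decreases linearly up to $b^*$, and ends at $R(1/2) = H_2(p)$.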

Now let us shift our attention to the causal case of the binary example, i.e., $S_i$ is only causally available at the action encoder.

Lemma 2 For the causal case of the binary example, it is without loss of optimality to have the following constraints when solving the optimization problem in Eq. (5):

• $\mathcal{V} = \{0, 1\}$, $\Pr(V = 0) = \theta$, for some $\theta \in [0, 1]$.

• The function $a = f(s, v)$ is of the form: $f(s, 0) = s$, $f(s, 1) = 0$.

• $\theta/2 \le B$.

Proof: See Appendix. ■

Using Lemma 2, we obtain

$$\begin{aligned}
R(B) &= \min H(Y|V) \\
&= \min_{\theta \in [0,1],\ \theta/2 \le B} \theta H(Y|V = 0) + (1 - \theta) H(Y|V = 1) \\
&= \min_{\theta \in [0,1],\ \theta/2 \le B} \theta H(S_N|V = 0) + (1 - \theta) H(S \oplus S_N|V = 1) \\
&= \min_{\theta \in [0,1],\ \theta/2 \le B} \theta H_2(p) + (1 - \theta) \\
&= \begin{cases}
2B H_2(p) + (1 - 2B), & 0 \le B \le 1/2; \\
H_2(p), & 1/2 < B.
\end{cases}
\end{aligned}$$

For the binary example with $p = 0.1$, we plot the rate-cost function $R(B)$ for both cases in Figure 5. Note that when $S$ is only causally known at the action encoder, the optimum lossless compression scheme amounts to time sharing between compressing the noise $S_N$ losslessly and compressing $S$ losslessly. The optimum time-sharing factor is determined by the cost $B$.



[Figure 5: R-B curve with $p = 0.1$; horizontal axis $B$ from 0 to 0.5.]

Fig. 5. Comparison between the non-causal and causal rate-cost functions. The parameter of the Bernoulli noise is set at 0.1.
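The causal rate-cost function of the binary example has a one-line closed form: the optimal time-sharing factor is the largest $\theta$ allowed by $\theta/2 \le B$. A minimal sketch follows. The function names are hypothetical and entropies are in bits.

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rate_causal(B, p):
    """Causal rate-cost function: time sharing between H2(p) and 1 bit per symbol."""
    theta = min(2.0 * B, 1.0)          # fraction of time the state is cancelled
    return theta * h2(p) + (1.0 - theta)

# Sample points of the causal curve for p = 0.1
curve = [(B / 10, rate_causal(B / 10, 0.1)) for B in range(0, 6)]
```

The objective $\theta H_2(p) + (1 - \theta)$ is decreasing in $\theta$ for $p < 1/2$, which is why the constraint $\theta \le 2B$ is met with equality until $\theta$ reaches 1.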



2) Erased side information: We now turn to the case when side information is available at the decoder only. We extend our setting in the previous example by letting $Z$ be an erased version of $S$. That is,

$$Z = \begin{cases} S & \text{w.p. } 1 - p_e, \\ e & \text{w.p. } p_e. \end{cases}$$

In this case, the rate-cost function is related to the case when no side information is available at the decoder in a simple manner. We first note the following:

$$\begin{aligned}
P(V|Z = e) &= \sum_s P(S, V|Z = e) \\
&= \sum_s P(S|Z = e) P(V|S, Z = e) \\
&= P(V).
\end{aligned} \tag{8}$$

The third equality follows from the Markov chain $V - S - Z$. Furthermore,

$$\begin{aligned}
P(Y|V, Z = e) &= \sum_s P(S|Z = e, V) P(Y|V, S, Z = e) \\
&= \sum_s P(S|V) P(Y|V, S) \\
&= P(Y|V).
\end{aligned} \tag{9}$$

The second equality follows from the Markov chain $Z - (S, V) - Y$ and $P(S|Z = e, V) = P(S, V, Z = e)/P(V, Z = e) = P(S, V)P(Z = e)/(P(Z = e)P(V)) = P(S|V)$. We now consider the rate-cost expression when $S$ is non-causally known at the action encoder:

$$\begin{aligned}
R(B) &= \min I(V; S|Z) + H(Y|V, Z) \\
&\overset{(a)}{=} \min\, p_e I(V; S) + p_e H(Y|V) + (1 - p_e) H(Y|V, S) \\
&= p_e \min\big(I(V; S) + H(Y|V)\big) + (1 - p_e) H_2(p).
\end{aligned}$$

(a) follows from the following observations: (i) when $Z = e$, $P(V|Z = e) = P(V)$ by (8), so $H(V|Z = e) = H(V)$, and from the Markov chain $V - S - Z$ and $P(S|Z = e) = P(S)$, $H(V|S, Z = e) = H(V|S)$. Hence, $I(V; S|Z = e) = I(V; S)$ (when $Z = S$, the mutual information term vanishes); and (ii) from (9), $H(Y|V, Z = e) = H(Y|V)$. The last equality follows since, when $Z = S$, $H(Y|Z = S, V) = H(Y|S, V)$. Since $A = f(S, V)$, $S_N \sim$ Bern($p$) is independent of $(S, V, Z)$ and $Y = S \oplus A \oplus S_N$, we have $H(Y|S, V) = H_2(p)$.

As checks, note that when $p_e = 1$, which corresponds to the no side information case, the rate-cost function reduces to that in Corollary 1, and when $p_e = 0$, the rate-cost function reduces to $H_2(p)$, which corresponds to the minimum rate required when $S$ is also available at the decoder. We now turn to the case when $S$ is only causally known at the action encoder. Here, we have

$$\begin{aligned}
R(B) &= \min H(Y|V, Z) \\
&= p_e \min H(Y|V) + (1 - p_e) H_2(p).
\end{aligned}$$

The rate-cost tradeoff is shown in Figure 6.
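Both expressions above have the same structure: the rate with erased side information is a convex combination of the rate without side information and $H_2(p)$. The sketch below illustrates this, using the causal closed form of the binary example as the no-side-information curve. The function names are hypothetical and entropies are in bits.

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rate_causal_no_side_info(B, p):
    """Causal rate-cost function of the binary example with Z absent."""
    theta = min(2.0 * B, 1.0)
    return theta * h2(p) + (1.0 - theta)

def rate_with_erased_side_info(rate_no_side_info, p_e, p):
    """R(B) = p_e * R_no_side_info(B) + (1 - p_e) * H2(p)."""
    return p_e * rate_no_side_info + (1.0 - p_e) * h2(p)

p, p_e = 0.1, 0.5
curve = [(B, rate_with_erased_side_info(rate_causal_no_side_info(B, p), p_e, p))
         for B in (0.0, 0.25, 0.5)]
```

Passing the no-side-information rate as an argument keeps the same helper usable for the non-causal curve as well.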

[Figure 6: R-B curve with $p = 0.1$, $p_e = 0.5$.]

Fig. 6. Comparison between the non-causal and causal rate-cost functions with erased side information at the decoder. The parameter of the Bernoulli noise is set at 0.1 and the parameter of the erased side information is set at 0.5.



IV. Lossy compression with actions 

In this section, we extend our setup to the lossy case. We first consider the case when side information is available at both the compressor and the decoder. We characterize the rate-distortion-cost tradeoff region for the case when $S^n$ is causally known to the action encoder. The case when $S^n$ is non-causally known at the action encoder is more involved. We give an achievable rate-distortion-cost region for that setting.

We then move on to the case when side information $Z$ is available at the decoder only, and $S$ is non-causally known at the action encoder. We discuss two achievability schemes for this setting.


A. Side information known at compressor and decoder 

Theorem 3 The rate-cost-distortion function for the case with causal state information and side information available to both the compressor and the decoder is given by

$$R(B, D) = \min_{a = f(s,v)\,:\ E\Lambda(S,A,Y) \le B,\ E d(Y, \hat{Y}) \le D} I(Y; \hat{Y}|V, Z), \tag{10}$$

where the joint distribution is of the form $p(z, s, v, a, y, \hat{y}) = p(s, z)\,p(v)\,1_{\{a = f(s,v)\}}\,p(y|a, s)\,p(\hat{y}|y, v, z)$. The cardinality of $\mathcal{V}$ is upper bounded by $|\mathcal{S}| + 2$.

Achievability sketch: The achievability is straightforward, with $V^n$ acting as the time-sharing random variable known to all parties. We first generate $V^n \sim \prod_{i=1}^n p(v_i)$. For each $z^n$ sequence, we generate $2^{n(I(Y;\hat{Y}|V,Z)+\epsilon)}$ $\hat{Y}^n$ sequences according to $\prod_{i=1}^n p(\hat{y}_i|v_i, z_i)$. The action encoder simply generates $a^n$ according to $a_i = f(v_i, s_i)$ for $i \in [1 : n]$. The compressor looks for a $\hat{y}^n$ sequence such that $(y^n, \hat{y}^n, v^n, z^n) \in \mathcal{T}_\epsilon^{(n)}$. It then sends out this description to the decoder, which reconstructs $Y^n$ as $\hat{y}^n$. Since we have $2^{n(I(Y;\hat{Y}|V,Z)+\epsilon)}$ $\hat{Y}^n$ sequences, the probability of not finding a jointly typical $\hat{Y}^n$ sequence goes to zero as $n \to \infty$.

Converse: Given an $(n, 2^{nR})$ code satisfying the cost and distortion conditions, we have

$$\begin{aligned}
nR &\ge H(M|Z^n) \\
&\ge I(M; Y^n|Z^n) \\
&= \sum_{i=1}^n I(M; Y_i|Y^{i-1}, Z^{n \setminus i}, Z_i) \\
&\overset{(a)}{=} \sum_{i=1}^n I(M; Y_i|V_i, Z_i) \\
&\overset{(b)}{\ge} \sum_{i=1}^n I(\hat{Y}_i; Y_i|V_i, Z_i) \\
&\overset{(c)}{=} n I(\hat{Y}_Q; Y_Q|V_Q, Q, Z_Q),
\end{aligned}$$

where in (a) we set $V_i = (Y^{i-1}, Z^{n \setminus i})$; (b) holds from the fact that $\hat{Y}_i$ is a function of $M$ and $Z^n$. In (c) we introduce $Q$ as the time-sharing random variable, i.e., $Q \sim \mathrm{Unif}[1, \ldots, n]$. Thus, by setting $V = (V_Q, Q)$ and $Y = Y_Q$, and noting that $V$ is independent of $(S, Z)$, we have shown that $R(B, D) \ge I(Y; \hat{Y}|V, Z)$ for some $p(v)p(s|v)p(z|s)p(a|s, v)p(y|a, s)p(\hat{y}|y, a, s, z, v)$. It suffices to restrict attention to the joint distribution stated in Theorem 3 because of the following observations.

• $p(\hat{y}|a, y, v, s, z)$ can be restricted to $p(\hat{y}|y, v, z)$, since the mutual information term $I(Y; \hat{Y}|V, Z)$ and the distortion constraint only depend on the marginal distribution $p(y, \hat{y}, v, z)$.

• $p(a|s, v)$ can be restricted to $a = f(s, v)$, since we can always find an independent random variable $U$ and a function $f$ such that $p(a|s, v) = \sum_u p(u)\,1_{\{a = f(s,v,u)\}}$. Now define $V' = (V, U)$ and $p(\hat{y}|y, v', z) = p(\hat{y}|y, v, z)$. Note that marginalizing $p(s, z)\,p(v')\,1_{\{a = f(s,v')\}}\,p(y|a, s)\,p(\hat{y}|y, v', z)$ over $u$ gives $p(s, z)\,p(v)\,p(a|s, v)\,p(y|a, s)\,p(\hat{y}|y, v, z)$. Since the joint distribution remains unchanged, the distortion and cost are preserved. As for the rate, we note that

$$\begin{aligned}
I(\hat{Y}; Y|V', Z) &= H(\hat{Y}|V', Z) - H(\hat{Y}|V', Z, Y) \\
&= H(\hat{Y}|V, U, Z) - H(\hat{Y}|V, U, Z, Y) \\
&\le H(\hat{Y}|V, Z) - H(\hat{Y}|V, Z, Y) \\
&= I(\hat{Y}; Y|V, Z).
\end{aligned}$$

Our next theorem gives an upper bound on the rate-distortion-cost tradeoff for the case when the state information is known non-causally at the action encoder and $Z^n$ is present at both the compressor and the decoder.

Theorem 4 An upper bound on the rate-distortion-cost func- 
tion for the case with non-causal state information and side 
information at both the compressor and decoder is given by 

R(B,D)< min I(V; S\Z)+I(Y; Y\V, Z) 

EA(S.A.Y)<B,Ed{Y,Y)<D 

(ID 

where the joint distribution is of the form p{s, v, a, y, y, z) = 

p(s, z)p(v\s)l {fis ^ v)=a} p(y\a, s)p(y\y, v, z). 

Sketch of achievability:

We generate $2^{n(I(V;S)+\epsilon)}$ sequences $V^n(l_0)$,
$l_0 \in [1 : 2^{n(I(V;S)+\epsilon)}]$, according to
$\prod_{i=1}^n p(v_i)$, and for each $v^n(l_0)$ and $z^n$, generate
$2^{n(I(Y;\hat{Y}|V,Z)+\epsilon)}$ sequences $\hat{Y}^n(l_0,l_1)$,
$l_1 \in [1 : 2^{n(I(Y;\hat{Y}|V,Z)+\epsilon)}]$, according to
$\prod_{i=1}^n p(\hat{y}_i|v_i,z_i)$.
The set of $v^n$ sequences is then randomly binned into
$2^{n(I(V;S|Z)+2\epsilon)}$ bins $\mathcal{B}(m)$,
$m \in [1 : 2^{n(I(V;S|Z)+2\epsilon)}]$. Given
a sequence $s^n$, the action encoder finds the $v^n$ sequence
which is jointly typical with $s^n$ and takes actions according
to $a_i = f(s_i,v_i)$ for $i \in [1 : n]$. At the compressor, we first
find a $v^n(l_0)$ that is jointly typical with $(y^n,z^n)$ and then a
$\hat{y}^n(l_0,l_1)$ such that
$(v^n(l_0),\hat{y}^n(l_0,l_1),y^n,z^n) \in \mathcal{T}_\epsilon^{(n)}$. Note
that there exists at least one $v^n(l_0)$ that is jointly typical with
$(y^n,z^n)$ with high probability since the true $v^n$ sequence is
jointly typical with $(y^n,z^n)$ with high probability. If there is
more than one such sequence, the compressor chooses one
uniformly at random from the set of $v^n$ sequences jointly
typical with $(y^n,z^n)$. The compressor then sends the indices
$m$ and $l_1$ such that the selected $v^n(l_0) \in \mathcal{B}(m)$. The decoder
recovers $v^n(l_0)$ by looking for the unique $l_0 \in \mathcal{B}(m)$ such that
$(v^n(l_0),z^n) \in \mathcal{T}_\epsilon^{(n)}$. It reconstructs $\hat{Y}^n$ as
$\hat{y}^n(l_0,l_1)$. From the
rates given, it is easy to see that all encoding and decoding
steps succeed with high probability as $n \to \infty$.
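The rate bookkeeping behind this sketch can be written out explicitly. The following short calculation is a sanity check rather than a proof; it assumes only that $V - S - Z$ forms a Markov chain, which follows from the factorization $p(s,z)p(v|s)$ in Theorem 4.

```latex
\begin{align*}
I(V;S) &= I(V;S,Z) = I(V;Z) + I(V;S|Z), \\
\frac{2^{n(I(V;S)+\epsilon)}}{2^{n(I(V;S|Z)+2\epsilon)}}
  &= 2^{n(I(V;Z)-\epsilon)}
  \quad \text{(average number of $v^n$ codewords per bin)}, \\
R &= \underbrace{I(V;S|Z)+2\epsilon}_{\text{index } m}
   + \underbrace{I(Y;\hat{Y}|V,Z)+\epsilon}_{\text{index } l_1}.
\end{align*}
```

Each bin therefore holds fewer than $2^{nI(V;Z)}$ codewords on average, which is what lets the decoder single out the correct one using $z^n$, and the total rate matches the bound in Theorem 4 up to $3\epsilon$.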

B. Side information available at the decoder only 

When the side information is available at the decoder only 
and $S^n$ is known non-causally at the action encoder, we discuss
two possible achievability schemes. The first scheme is a 
generalization of the achievability scheme of Theorem 1 to 
the lossy case. 



Theorem 5 An upper bound on the rate-distortion-cost
function for the case with non-causal state information and side
information at the decoder is given by

$$R(B,D) = I(V;S|Z) + I(U;Y|V,Z)$$

for some $p(z,s)p(v|s)\mathbf{1}_{\{a=f(s,v)\}}p(y|a,s)p(u|y)$ satisfying

$$\mathrm{E}\Lambda(A,S,Y) \le B,$$

$$\mathrm{E}d(Y,\hat{Y}(Z,U)) \le D.$$

Theorem 5 is a generalization of Theorem 1: setting $U = Y$
recovers Theorem 1.

Proof: As the achievability scheme is an extension of the 
achievability scheme for Theorem 1, we will only mention the 
additional steps in the proof of achievability. 

Codebook Generation 

The additional step in the codebook generation procedure
is the generation of a codebook of $U^n$ covering sequences
to cover $Y^n$ and a binning, or compression, codebook for
the $U^n$ sequences. We first generate $2^{n(I(U;Y)+\epsilon)}$
sequences $U^n(l_1)$ according to $\prod_{i=1}^n p(u_i)$. We then bin the set of
$U^n$ sequences into $2^{n(I(V;S|Z)+I(U;Y|V,Z)+5\epsilon)}$ bins $\mathcal{B}(m)$,
$m \in [1 : 2^{n(I(V;S|Z)+I(U;Y|V,Z)+5\epsilon)}]$.

Encoding 

The encoding procedure for the action encoder remains the
same as that in Theorem 1. The compressor first looks
for a $u^n(l_1)$ such that $(u^n(l_1),y^n) \in \mathcal{T}_\epsilon^{(n)}$. It then sends
the index $m$ such that $u^n(l_1) \in \mathcal{B}(m)$.

Decoding and analysis of probability of error 

The decoder looks for the unique $u^n(l_1) \in \mathcal{B}(m)$ such that
$(v^n(l_0),z^n,u^n(l_1)) \in \mathcal{T}_\epsilon^{(n)}$ for some
$l_0 \in [1 : 2^{n(I(V;S)+\epsilon)}]$.
For the analysis of the probability of error, let $L_0$ and $L_1$ be
the indices picked by the action encoder and the compressor,
respectively. Following the rates given in the codebook
generation and encoding procedure, the covering lemma [11,
Chapter 3] and the strong Markov lemma [11, Chapter 12], it
is easy to see that
$P((V^n(L_0),Y^n,Z^n,U^n(L_1)) \in \mathcal{T}_\epsilon^{(n)}) \to 1$
as $n \to \infty$. The other "error" event of interest is now the
following:

$$\begin{aligned}
\mathcal{E}_u := \{ & (V^n(l_0),Z^n,U^n(l_1)) \in \mathcal{T}_\epsilon^{(n)} \\
& \text{for some } l_1 \ne L_1,\ U^n(l_1) \in \mathcal{B}(M),\ l_0 \in [1 : 2^{n(I(V;S)+\epsilon)}] \}.
\end{aligned}$$

Due to the symmetry of the binning process, we can assume
without loss of generality that $M = 1$. Define $\mathcal{E}_{l_0}(V^n,Z^n)$ to be
the event

$$\begin{aligned}
\mathcal{E}_{l_0}(V^n,Z^n) := \{ & (V^n(l_0),Z^n,U^n(l_1)) \in \mathcal{T}_\epsilon^{(n)} \\
& \text{for some } l_1 \ne L_1,\ U^n(l_1) \in \mathcal{B}(1) \}.
\end{aligned}$$

Then, $P(\mathcal{E}_u)$ is upper bounded by

$$P(\mathcal{E}_u) \le \sum_{l_0} P(\mathcal{E}_{l_0}(V^n,Z^n)). \tag{12}$$
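We now give a bound for $P(\mathcal{E}_{l_0}(V^n,Z^n))$. We first have

$$\begin{aligned}
P(\mathcal{E}_{l_0}(V^n,Z^n))
&= \sum_{(v^n,z^n)} p_{V,Z}(v^n,z^n)\, P(\mathcal{E}_{l_0}(V^n,Z^n) | V^n = v^n, Z^n = z^n) \\
&= \sum_{(v^n,z^n) \in \mathcal{T}_\epsilon^{(n)}} p_{V,Z}(v^n,z^n)\, P(\mathcal{E}_{l_0}(v^n,z^n) | v^n,z^n).
\end{aligned}$$

Next, let $\mathcal{E}_{l_0}(j,v^n,z^n)$ be the error event

$$\{(v^n(l_0),z^n,U^n(j)) \in \mathcal{T}_\epsilon^{(n)},\ L_1 \ne j,\ U^n(j) \in \mathcal{B}(1)\}.$$

Then,
$\mathcal{E}_{l_0}(v^n,z^n) \subseteq \bigcup_{j=1}^{2^{n(I(U;Y)+\epsilon)}} \mathcal{E}_{l_0}(j,v^n,z^n)$.
We therefore have

$$\begin{aligned}
P(\mathcal{E}_{l_0}(v^n,z^n) | v^n,z^n)
&\le \sum_{j=1}^{2^{n(I(U;Y)+\epsilon)}} P(\mathcal{E}_{l_0}(j,v^n,z^n) | v^n,z^n) \\
&\le \sum_{j=1}^{2^{n(I(U;Y)+\epsilon)}} P((v^n,z^n,U^n(j)) \in \mathcal{T}_\epsilon^{(n)},\ U^n(j) \in \mathcal{B}(1) | v^n,z^n) \\
&= \sum_{j=1}^{2^{n(I(U;Y)+\epsilon)}} P((v^n,z^n,U^n(j)) \in \mathcal{T}_\epsilon^{(n)} | v^n,z^n)\, P(U^n(j) \in \mathcal{B}(1) | v^n,z^n) \\
&\le 2^{n(I(U;Y)+\epsilon)}\, 2^{-n(I(U;Z,V)-\epsilon)}\, 2^{-n(I(V;S|Z)+I(U;Y|V,Z)+5\epsilon)} \\
&= 2^{-n(I(V;S|Z)+3\epsilon)}.
\end{aligned}$$

We therefore have

$$\begin{aligned}
P(\mathcal{E}_{l_0}(V^n,Z^n))
&\le \sum_{(v^n,z^n) \in \mathcal{T}_\epsilon^{(n)}} P(V^n(l_0) = v^n, Z^n = z^n)\, 2^{-n(I(V;S|Z)+3\epsilon)} \\
&\le 2^{-n(I(V;Z)-\epsilon)}\, 2^{-n(I(V;S|Z)+3\epsilon)}.
\end{aligned}$$

Hence, from (12),

$$\begin{aligned}
P(\mathcal{E}_u) &\le 2^{n(I(V;S)+\epsilon)}\, 2^{-n(I(V;Z)-\epsilon)}\, 2^{-n(I(V;S|Z)+3\epsilon)} \\
&= 2^{-n\epsilon}.
\end{aligned}$$

Therefore, $P(\mathcal{E}_u) \to 0$ as $n \to \infty$.

Since the probability of "error" goes to zero as $n \to \infty$,
the expected distortion of the reconstruction $\hat{y}_i = \hat{y}(z_i,u_i)$,
$i \in [1 : n]$, is less than or equal to $D$ as $n \to \infty$. ■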
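The two exponent simplifications in the proof rest on the identities $I(U;Y) = I(U;Z,V) + I(U;Y|V,Z)$ and $I(V;S) = I(V;Z) + I(V;S|Z)$, which hold for distributions of the Theorem 5 form. The sketch below checks them numerically on a randomly drawn joint distribution. The binary alphabets, the choice $f(s,v) = s \oplus v$, and all function names are illustrative assumptions, not part of the paper.

```python
import itertools
import math
import random

Z, S, V, A, Y, U = range(6)  # coordinate positions in each outcome tuple


def random_pmf(k, rng):
    """Return a strictly positive random pmf on k points."""
    weights = [rng.random() + 0.1 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def build_joint(rng):
    """Joint pmf over binary (z, s, v, a, y, u) of the form
    p(z,s) p(v|s) 1{a = f(s,v)} p(y|a,s) p(u|y)."""
    p_zs = random_pmf(4, rng)                        # p(z, s), index 2*z + s
    p_v_s = [random_pmf(2, rng) for _ in range(2)]   # p(v | s)
    p_y_as = {(a, s): random_pmf(2, rng) for a in (0, 1) for s in (0, 1)}
    p_u_y = [random_pmf(2, rng) for _ in range(2)]   # p(u | y)
    joint = {}
    for z, s, v, y, u in itertools.product((0, 1), repeat=5):
        a = s ^ v  # hypothetical action function f(s, v), chosen for illustration
        joint[(z, s, v, a, y, u)] = (
            p_zs[2 * z + s] * p_v_s[s][v] * p_y_as[(a, s)][y] * p_u_y[y][u]
        )
    return joint


def marginal(joint, idx):
    """Marginal pmf over the coordinates listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(joint, xs, ys, zs=()):
    """I(X; Y | Z) in bits; xs, ys, zs are tuples of coordinate indices."""
    p_xyz = marginal(joint, xs + ys + zs)
    p_xz = marginal(joint, xs + zs)
    p_yz = marginal(joint, ys + zs)
    p_z = marginal(joint, zs)
    nx, ny = len(xs), len(ys)
    total = 0.0
    for key, p in p_xyz.items():
        if p == 0.0:
            continue
        x, y, z = key[:nx], key[nx:nx + ny], key[nx + ny:]
        total += p * math.log2(p * p_z[z] / (p_xz[x + z] * p_yz[y + z]))
    return total


def error_exponents(joint, eps):
    """Return (inner exponent + I(V;S|Z), overall exponent of the P(E_u) bound)."""
    def mi(xs, ys, zs=()):
        return mutual_information(joint, xs, ys, zs)

    i_vs_z = mi((V,), (S,), (Z,))
    inner = ((mi((U,), (Y,)) + eps) - (mi((U,), (Z, V)) - eps)
             - (i_vs_z + mi((U,), (Y,), (V, Z)) + 5 * eps))
    outer = (mi((V,), (S,)) + eps) - (mi((V,), (Z,)) - eps) + inner
    return inner + i_vs_z, outer


if __name__ == "__main__":
    shifted_inner, outer = error_exponents(build_joint(random.Random(0)), eps=0.01)
    print(shifted_inner, outer)
```

Under these assumptions the first returned value should equal $-3\epsilon$ and the second $-\epsilon$ up to floating-point error, matching the exponents $2^{-n(I(V;S|Z)+3\epsilon)}$ and $2^{-n\epsilon}$ in the proof.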

The achievability scheme in Theorem 5 restricts the
description of $Y^n$ that is sent, $U^n$, to be independent of $V^n$ given $Y^n$.
This is a result of not requiring the compressor to decode the
true $V^n$ codeword that was selected by the action encoder. In
our next scheme, we remove the Markov condition $U - Y - V$
by making the compressor decode $V^n$. This operation results
in a different restriction on the allowable joint probability
distribution, $I(V;Y) > I(V;S)$.

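Theorem 6 An upper bound on the rate-distortion-cost
function for the case with non-causal state information and side
information at the decoder is given by

$$R(B,D) = I(V;S|Z) + I(U;Y|V,Z)$$

for some $p(z,s)p(v|s)\mathbf{1}_{\{a=f(s,v)\}}p(y|a,s)p(u|y,v)$ satisfying

$$\mathrm{E}\Lambda(A,S,Y) \le B,$$

$$\mathrm{E}d(Y,\hat{Y}(Z,U)) \le D,$$

$$I(V;S) < I(V;Y).$$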

Sketch of achievability

We generate $2^{n(I(V;S)+\epsilon)}$ sequences $V^n(l_0)$,
$l_0 \in [1 : 2^{n(I(V;S)+\epsilon)}]$, according to
$\prod_{i=1}^n p(v_i)$, and for each $v^n(l_0)$,
generate $2^{n(I(U;Y|V)+\epsilon)}$ sequences $U^n(l_0,l_1)$,
$l_1 \in [1 : 2^{n(I(U;Y|V)+\epsilon)}]$, according to
$\prod_{i=1}^n p(u_i|v_i)$. The set of $V^n$
sequences is partitioned into $2^{n(I(V;S|Z)+2\epsilon)}$ bins $\mathcal{B}(m_0)$,
$m_0 \in [1 : 2^{n(I(V;S|Z)+2\epsilon)}]$, while the set of $U^n$ sequences is
partitioned into $2^{n(I(U;Y|V,Z)+2\epsilon)}$ bins $\mathcal{B}(m_1)$,
$m_1 \in [1 : 2^{n(I(U;Y|V,Z)+2\epsilon)}]$. Given a sequence $s^n$, the action encoder
finds the $v^n$ sequence which is jointly typical with $s^n$ and
takes actions according to $a_i = f(s_i,v_i)$ for $i \in [1 : n]$.
At the compressor, we first find $v^n(l_0)$ by joint typicality
decoding. It can be shown that this decoding procedure
succeeds with high probability provided $I(V;Y) > I(V;S)$
[12]. Next, the compressor looks for a $u^n(l_0,l_1)$ such that
$(v^n(l_0),u^n(l_0,l_1),y^n) \in \mathcal{T}_\epsilon^{(n)}$. The compressor then sends
the indices $m_0$ and $m_1$ such that $v^n(l_0) \in \mathcal{B}(m_0)$ and
$u^n(l_0,l_1) \in \mathcal{B}(m_1)$. The decoding operation now follows
standard Wyner-Ziv decoding. The decoder first recovers
$v^n(l_0)$ by looking for the unique $l_0 \in \mathcal{B}(m_0)$ such that
$(v^n(l_0),z^n) \in \mathcal{T}_\epsilon^{(n)}$. Next, it recovers $u^n(l_0,l_1)$ by looking for
the unique $u^n(l_0,l_1) \in \mathcal{B}(m_1)$ such that
$(v^n(l_0),u^n(l_0,l_1),z^n) \in \mathcal{T}_\epsilon^{(n)}$.
It reconstructs $\hat{Y}^n$ as $\hat{y}_i(v_i(l_0),u_i(l_0,l_1),z_i)$ for $i \in [1 : n]$.
From the rates given, the encoding and decoding steps succeed
with high probability as $n \to \infty$.
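The Wyner-Ziv step for the $U^n$ codebook can be checked with the same kind of bookkeeping. The calculation below is a sketch under two assumptions: that $U - (Y,V) - Z$ forms a Markov chain, which follows from the factor $p(u|y,v)$ in Theorem 6, and that the $l_1$ indices are binned separately for each $l_0$.

```latex
\begin{align*}
I(U;Y|V) &= I(U;Y,Z|V) = I(U;Z|V) + I(U;Y|V,Z), \\
\frac{2^{n(I(U;Y|V)+\epsilon)}}{2^{n(I(U;Y|V,Z)+2\epsilon)}}
  &= 2^{n(I(U;Z|V)-\epsilon)}
  \quad \text{(average number of $u^n$ codewords per bin)}, \\
R &= \underbrace{I(V;S|Z)+2\epsilon}_{\text{index } m_0}
   + \underbrace{I(U;Y|V,Z)+2\epsilon}_{\text{index } m_1}.
\end{align*}
```

With fewer than $2^{nI(U;Z|V)}$ codewords per bin on average, the decoder can resolve $u^n(l_0,l_1)$ from $(v^n(l_0),z^n)$, and the total rate agrees with Theorem 6 up to $4\epsilon$.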



V. Conclusion and future directions 

In this paper, we consider a variation of lossy and lossless
compression where, instead of compressing the original source,
we take actions to modify the source before compression,
subject to a cost constraint. In the lossless case, we characterize
the rate-cost tradeoff for several different cases, including the
cases where side information is available at the decoder only,
and where the original source $S^n$ is either known causally
or non-causally at the action encoder. We then extend
the analysis to the lossy case, where we characterize the
rate-distortion-cost tradeoff for the case where $S^n$ is known
only causally and side information is available at both the
compressor and the decoder.

Our setting can be extended in several different directions. 
One possible extension is to consider the case of message 
embedding, where we desire to send a message together with 
conveying information about $Y^n$. Another extension that may
be of interest is to consider the case where we have distributed
state information $S_1$ and $S_2$, which are correlated, at two
different action encoders. We are still interested in the output
$Y^n$, but the additional dimension in this extension is in how
the two distributed action encoders can coordinate to generate
an output $Y^n$ that is as compressible as possible.
Appendix

A. Proof of Lemma 1

Fixing a $v$, the function $a = f(s,v)$ has only four possible
forms: $a = s$, $a = 1 - s$, $a = 0$ and $a = 1$. Thus, we can
divide $\mathcal{V}$ into four groups:

$$\begin{aligned}
\mathcal{V}_0 &= \{v : f(s,v) = s\}, \\
\mathcal{V}_1 &= \{v : f(s,v) = 1 - s\}, \\
\mathcal{V}_2 &= \{v : f(s,v) = 0\}, \\
\mathcal{V}_3 &= \{v : f(s,v) = 1\}.
\end{aligned} \tag{13}$$

First, it is without loss of optimality to set $\mathcal{V}_3 = \emptyset$. That
is because for each $v \in \mathcal{V}_3$, we can change the function to
$f(s,v) = 0$. The rate $I(V;S) + H(Y|V)$ does not change and
the cost $\mathrm{E}\Lambda$ only decreases.



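Rewrite the objective function in the following way:

$$\begin{aligned}
I(V;S) + H(Y|V)
&= H(Y|V) - H(S|V) + H(S) \\
&= H(S \oplus A \oplus Z|V) - H(S|V) + H(S) \\
&= \sum_{v \in \mathcal{V}_0} (H_2(p) - H(S|V = v))\,p(v) \\
&\quad + \sum_{v \in \mathcal{V}_1} (H_2(p) - H(S|V = v))\,p(v) \\
&\quad + \sum_{v \in \mathcal{V}_2} (H(S \oplus S_N|V = v) - H(S|V = v))\,p(v) + H(S),
\end{aligned} \tag{14}$$

where the last step is obtained by plugging in the actual form
of $a = f(s,v)$ for each group of $v$.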

Second, it is sufficient to have $|\mathcal{V}_0| = 1$ and $|\mathcal{V}_1| = 1$. To
prove this, let $v_1, v_2 \in \mathcal{V}_0$. Note that $H(S|V = v)$ is a concave
function in $p(s|V = v)$. Thus if we replace $v_1, v_2$ by a $v_3$ with
$p(v_3) = p(v_1) + p(v_2)$ and

$$p(s|V = v_3) = \frac{p(v_1)}{p(v_1) + p(v_2)}\,p(s|V = v_1)
+ \frac{p(v_2)}{p(v_1) + p(v_2)}\,p(s|V = v_2),$$

we preserve the distribution of $S$ and the cost $\mathrm{E}\Lambda$, but we reduce
the first term, i.e., $\sum_{v \in \mathcal{V}_0} (H_2(p) - H(S|V = v))\,p(v)$, in
Eq. (14). Therefore, we can set $\mathcal{V}_0 = \{0\}$ and $\mathcal{V}_1 = \{1\}$.
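The merging argument can be illustrated numerically. The sketch below assumes a binary state $S$, so that each $v \in \mathcal{V}_0$ is described by the pair $(p(v), \Pr(S = 1|V = v))$; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def binary_entropy(q):
    """Binary entropy H_2(q) in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def first_term(groups, p_noise):
    """Sum over v in V_0 of (H_2(p) - H(S|V=v)) p(v).

    groups is a list of (p(v), Pr(S=1 | V=v)) pairs.
    """
    return sum(pv * (binary_entropy(p_noise) - binary_entropy(q))
               for pv, q in groups)


def merge(g1, g2):
    """Replace v1, v2 by a single v3 as in the proof of Lemma 1."""
    (p1, q1), (p2, q2) = g1, g2
    p3 = p1 + p2
    q3 = (p1 * q1 + p2 * q2) / p3  # mixture of the two conditionals
    return (p3, q3)


if __name__ == "__main__":
    v1, v2 = (0.2, 0.1), (0.3, 0.7)
    before = first_term([v1, v2], p_noise=0.1)
    after = first_term([merge(v1, v2)], p_noise=0.1)
    print(before, after)
```

Because $H(S|V = v)$ is concave in $p(s|V = v)$, the value after merging is never larger than the value before, while $p(v_3)\Pr(S = 1|V = v_3)$ equals the sum of the two original contributions, so the marginal of $S$ is unchanged.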
Third, note that for each $v \in \mathcal{V}_2$,

$$\begin{aligned}
H(Y|V = v) - H(S|V = v)
&= H(S \oplus A \oplus Z|V = v) - H(S|V = v) \\
&= H(S \oplus S_N|V = v) - H(S|V = v) \\
&\ge 0.
\end{aligned} \tag{15}$$
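Inequality (15) says that adding noise cannot reduce the entropy of the state. A minimal numeric sketch for the binary case follows, under the assumption that $S_N$ is Bernoulli($p$) and independent of $S$ given $V = v$; the helper names are hypothetical.

```python
import math


def binary_entropy(q):
    """Binary entropy H_2(q) in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def xor_probability(q, p):
    """Pr(S xor S_N = 1) for independent S ~ Bern(q) and S_N ~ Bern(p)."""
    return q * (1.0 - p) + (1.0 - q) * p


def entropy_gain(q, p):
    """H(S xor S_N) - H(S); non-negative for every q and p in [0, 1]."""
    return binary_entropy(xor_probability(q, p)) - binary_entropy(q)


if __name__ == "__main__":
    grid = [i / 20 for i in range(21)]
    print(min(entropy_gain(q, p) for q in grid for p in grid))
```

The gain is non-negative because $\Pr(S \oplus S_N = 1) - 1/2 = (q - 1/2)(1 - 2p)$, so the XOR output is never further from uniform than $S$ itself.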

Last, if $\Pr(S = 0|V = 0) \ne \Pr(S = 1|V = 1)$, consider a
new auxiliary random variable $V'$ with the following distribution:

• $\mathcal{V}' = \{0,1,2\}$, $\Pr(V' = 0) = \Pr(V' = 1) = (\Pr(V = 0) + \Pr(V = 1))/2$.

• The function $a = f(s,v')$ is of the form: $f(s,0) = s$,
$f(s,1) = 1 - s$ and $f(s,2) = 0$.

• $\Pr(S = 0|V' = 2) = 1/2$ and

$$\Pr(S = 1|V' = 0) = \Pr(S = 0|V' = 1)
= \frac{\Pr(S = 1|V = 0)\Pr(V = 0) + \Pr(S = 0|V = 1)\Pr(V = 1)}{\Pr(V = 0) + \Pr(V = 1)}.$$

Comparing $(S,V)$ with $(S,V')$, we can check that the cost
$\mathrm{E}\Lambda$ and the distribution of $S$ are preserved. Meanwhile, the
objective function is reduced, which completes the proof.



B. Proof of Lemma 2

Similar to the proof of Lemma 1, we divide $\mathcal{V}$ into
$\mathcal{V}_0, \mathcal{V}_1, \mathcal{V}_2, \mathcal{V}_3$. Using the same argument, we show that $\mathcal{V}_3 = \emptyset$.
Rewrite the objective function $H(Y|V)$ in the following way:

$$\begin{aligned}
H(Y|V)
&= H(S \oplus A \oplus S_N|V) \\
&= \sum_{v \in \mathcal{V}_0} H_2(p)\,p(v)
 + \sum_{v \in \mathcal{V}_1} H_2(p)\,p(v)
 + \sum_{v \in \mathcal{V}_2} H(S \oplus S_N|V = v)\,p(v)
\end{aligned} \tag{16}$$

= H 2 (P) + H p^> 

which implies that it is sufficient to consider the case $|\mathcal{V}_0| = 1$,
$\mathcal{V}_1 = \emptyset$ and $|\mathcal{V}_2| = 1$. This completes the proof.



